Introduction

Automatic summary generation, like summarization in general, aims to create a shortened version of a text without losing its overall meaning. The difference is that automatic summarization produces a meaningful summary without human intervention. Summarization typically involves two basic steps: first, the most semantically important sentences are located and extracted; second, these sentences are combined into a logically connected text without disruptions. Based on these two steps, there are two basic types of automatic summarization: extractive and abstractive.

Extractive summary

Extractive summary generation focuses on identifying the most important sentences and extracting them from the text as they are, without paraphrasing them into a coherent text. It is the less complex of the two approaches, since the algorithms only need to locate the semantically important sentences.
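A minimal sketch of the extractive step, assuming a simple word-frequency score (a classic heuristic, not the method used in this work): each sentence is scored by the average normalized frequency of its content words, and the top-k sentences are returned verbatim, in their original order. The stopword list, the regex tokenization and the function names are illustrative assumptions.

```python
import re
from collections import Counter

# Tiny illustrative stopword list (an assumption; real systems use larger lists).
STOPWORDS = {"a", "an", "the", "and", "or", "of", "to", "in", "is", "are",
             "it", "on", "for", "with", "as", "by", "this", "that", "be"}


def split_sentences(text: str) -> list[str]:
    # Naive splitting after '.', '!' or '?' followed by whitespace.
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s.strip()]


def content_words(sentence: str) -> list[str]:
    # Lowercase alphabetic tokens that are not stopwords.
    return [w for w in re.findall(r"[a-z]+", sentence.lower()) if w not in STOPWORDS]


def extractive_summary(text: str, k: int = 2) -> list[str]:
    sentences = split_sentences(text)
    freq = Counter(w for s in sentences for w in content_words(s))
    if not freq:
        return sentences[:k]
    top = max(freq.values())

    def score(s: str) -> float:
        words = content_words(s)
        # Average normalized frequency, so long sentences are not automatically favored.
        return sum(freq[w] / top for w in words) / len(words) if words else 0.0

    ranked = sorted(range(len(sentences)), key=lambda i: score(sentences[i]), reverse=True)
    # Return the selected sentences unchanged (no paraphrasing), in document order.
    return [sentences[i] for i in sorted(ranked[:k])]
```

For example, on "Cats sleep a lot. Cats chase mice. Dogs bark loudly at night. Cats and mice are rivals." with `k=2`, the sketch selects the two sentences built from the most frequent words: "Cats chase mice." and "Cats and mice are rivals."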

Abstractive summary

Abstractive automatic summarization, on the other hand, restates the extracted sentences and joins them into a coherent summary. This approach is significantly more complex because it involves natural language generation, which draws on many NLP research areas, such as anaphora resolution, discourse analysis, named entity recognition, sentence boundary disambiguation, textual entailment, and others. Many of these sub-areas are also used in extractive summarization to fine-tune sentence selection.
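Sentence boundary disambiguation, one of the sub-areas listed above, also matters for extractive summarization, since text must be segmented into sentences before they can be scored. A minimal rule-based sketch, assuming a small hypothetical abbreviation list (`ABBREVIATIONS`) and a hypothetical function name (`split_sentences`): a '.', '!' or '?' followed by whitespace and an uppercase letter ends a sentence, unless the period closes a known abbreviation or a single-letter initial.

```python
import re

# Hypothetical abbreviation list for illustration; real systems curate or learn a much larger one.
ABBREVIATIONS = {"dr", "mr", "mrs", "ms", "prof", "fig", "etc", "vs", "e.g", "i.e"}


def split_sentences(text: str) -> list[str]:
    """Split text into sentences, skipping periods that end abbreviations or initials."""
    sentences: list[str] = []
    start = 0
    # Candidate boundary: '.', '!' or '?' followed by whitespace and an uppercase letter.
    for m in re.finditer(r"[.!?](?=\s+[A-Z])", text):
        tokens = text[start:m.start()].split()
        last = tokens[-1].lower() if tokens else ""
        # A period after a known abbreviation ("Dr.") or a one-letter initial ("J.")
        # is treated as part of the sentence, not as its end.
        if m.group() == "." and (last in ABBREVIATIONS or len(last) == 1):
            continue
        sentences.append(text[start:m.end()].strip())
        start = m.end()
    tail = text[start:].strip()
    if tail:
        sentences.append(tail)
    return sentences
```

On "Dr. Smith met J. Doe at 5 p.m. on Monday. They discussed LexRank. Was it useful? Yes!" this yields four sentences, keeping "Dr." and "J." inside the first one. Such rules still fail on, for example, a sentence that genuinely ends in "etc.", which is why practical systems often rely on trained models for this task.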

Baseline approach

The implementation approach chosen for coherent text summarization combines the well-known LexRank algorithm (Erkan and Radev, 2004) with semantic graphs and word-sense disambiguation techniques (Plaza and Díaz, 2011). Furthermore, we have automatically built thesauri for the top-level domains in order to produce domain-focused extractive summaries. Finally, we apply clause-boundary splitting to cut irrelevant or subordinate clauses from the sentences in the summary. The summarization engine is currently in the prototyping phase.
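The core LexRank idea can be sketched as follows. This is a minimal version of plain thresholded LexRank only, under stated assumptions: sentences become tf-idf vectors (with idf computed over the input sentences themselves), an edge links two sentences whose cosine similarity exceeds `threshold`, and a PageRank-style power iteration scores each sentence. It does not include the semantic-graph, word-sense disambiguation, thesaurus or clause-splitting components described above; the function names and default parameter values are illustrative.

```python
import math
import re
from collections import Counter


def tokenize(sentence: str) -> list[str]:
    # Lowercase alphabetic tokens; no stemming or stopword removal in this sketch.
    return re.findall(r"[a-z]+", sentence.lower())


def tfidf_vectors(sentences: list[str]) -> list[dict[str, float]]:
    docs = [Counter(tokenize(s)) for s in sentences]
    n = len(docs)
    df = Counter(w for d in docs for w in d)
    # Smoothed idf over the input sentences (an assumption; a larger background
    # corpus would normally supply these statistics).
    idf = {w: math.log((1 + n) / (1 + c)) + 1 for w, c in df.items()}
    return [{w: tf * idf[w] for w, tf in d.items()} for d in docs]


def cosine(u: dict[str, float], v: dict[str, float]) -> float:
    dot = sum(x * v.get(w, 0.0) for w, x in u.items())
    nu = math.sqrt(sum(x * x for x in u.values()))
    nv = math.sqrt(sum(x * x for x in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0


def lexrank(sentences: list[str], threshold: float = 0.1, damping: float = 0.85,
            iters: int = 100, tol: float = 1e-8) -> list[float]:
    n = len(sentences)
    if n == 0:
        return []
    vecs = tfidf_vectors(sentences)
    # Unweighted graph: an edge wherever cosine similarity exceeds the threshold
    # (self-similarity counts, so every non-empty sentence has a self-loop).
    adj = [[1.0 if cosine(vecs[i], vecs[j]) > threshold else 0.0 for j in range(n)]
           for i in range(n)]
    degree = [sum(row) for row in adj]
    scores = [1.0 / n] * n
    for _ in range(iters):
        # damping = probability of following an edge; otherwise jump uniformly.
        # A sentence with no edges spreads its score uniformly over all sentences.
        new = [
            (1 - damping) / n
            + damping * sum(scores[i] * (adj[i][j] / degree[i] if degree[i] else 1.0 / n)
                            for i in range(n))
            for j in range(n)
        ]
        converged = sum(abs(a - b) for a, b in zip(new, scores)) < tol
        scores = new
        if converged:
            break
    return scores


def lexrank_summary(sentences: list[str], k: int = 2, **kwargs) -> list[str]:
    scores = lexrank(sentences, **kwargs)
    top = sorted(range(len(sentences)), key=lambda i: scores[i], reverse=True)[:k]
    # Keep the selected sentences in document order.
    return [sentences[i] for i in sorted(top)]
```

For example, with the sentences "Apples, bananas and cherries.", "Apples grow in orchards.", "Bananas taste sweet." and "Cherries look red.", only the first sentence shares a word with each of the others, so it becomes the hub of the graph and `lexrank_summary(sentences, k=1)` returns it. The scores always sum to 1, because each iteration redistributes the full probability mass.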